1. Introduction



Historically speaking, aptitude testing has been a major factor in
manpower management since World War I, when the first large-scale use of
aptitude testing helped to mobilize military personnel. Since that time,
the measurement of aptitudes has occupied a central position in such
activities as personal counseling, educational planning, vocational
training, and career and academic selection and placement. Tests have
received extremely wide use in the selection and placement of applicants
by employers, college admissions officers, recruiters, administrators,
and job supervisors. When used for these purposes, tests are intended
to benefit both institutions and individuals. The benefit to the
institution accrues from the possibility of improved accuracy of selection,
i.e., minimizing the number of applicants selected or placed who will
subsequently fail to perform adequately. Thus, the institution is less
likely to waste valuable resources to train individuals who are not
likely to benefit from them. Similarly, individuals are thought to
benefit, in that those whose probability of adequate performance is not
great are not admitted, thus minimizing unproductive effort and resources
by these individuals and also sparing them the personal trauma of failure.

However, while management may see the use of tests as an efficient
way to channel talent, others often view the "gatekeeping" function of
tests as a barrier to economic and social advancement. In the latter
view, tests are threatening to those required to take them and a
deterrent to the upward mobility of those whose performance on them is
noncompetitive. In a high unemployment economy, job availability is likely
to be restricted to those having even higher test performance. Thus,
the visibility of tests, and perhaps the hostility toward them, is
more prevalent (e.g., Byham & Spitzer, 1971, pp. 14-38; Griggs vs. Duke
Power Co., 1971).

Test developers have the responsibility of ensuring their measurement
instruments function as barriers to those unlikely to succeed in
the selected tasks rather than those merely socioeconomically different
from a normative group. Identifying potential sources of test bias and
prescribing remedies is still an open issue among test developers. This
report reviews the basic sources of test discrimination against minority
ethnic or cultural subgroups, identifies sociolinguistic bias as an
issue receiving little attention, proceeds to develop and explore a method
for identifying sociolinguistic bias in tests, and then provides general
guidelines for construction of selection batteries for use by the armed
services.



2. Bias in Measurement

Dissatisfaction with tests is particularly great when it is noted
that certain groups are consistently less successful: Some ethnic
groups do better than others on tests of verbal ability (Anastasi, 1958,
pp. 505-571); women are said to be handicapped on tests that require
experiences more commonly available to males (Tyler, 1965, pp. 243-251).
Blacks, women, and those for whom English is a second language all compete
increasingly and visibly for jobs and professional standards set by the
traditional job-holders of America, White men in appropriate age ranges
(U.S. Department of Labor, 1973). Given this situation, it is reasonable
to ask whether a low score truly forecasts low performance and whether
the score difference is relevant to the purpose for which the person is
to be employed. Furthermore, it is important to ascertain whether it is
some temporary and easily remedied disadvantage of minority groups that
accounts for the low scores that effectively exclude them from
sought-after positions.

Large between-group differences in aptitude test performance have
been noted for more than 70 years (Cronbach, 1975), and the source of
these differences has been a topic of debate for nearly as long.
However, only within the last decade has the relationship of group
membership to aptitude measurement become a legal and social issue. Recently,
the controversy has captured the attention of an increasing number of
measurement experts who are directing careful thought and considerable
effort to the problem.

2.1. Factors Contributing to the Definitions of Test Bias

An important assumption often made in interpreting test scores is
that given reasonably comparable exposure to the culture, differences
in performing reflect past differences in response to that culture.
Furthermore, it seems reasonable to expect these differences to continue
and to influence future job performance (Canady, 1971, pp. 89-101;
Samuda, 1975, pp. 42-50). The premise of comparable exposure to a
culture, however, may be untenable. In fact, there are those (e.g.,
Samuda, 1975, pp. 63-100) who believe that different groups (men and
women, for example) are actually exposed to different cultures. The
appropriate question is whether the resulting group differences in test
scores are relevant to job performance. These differences may or may not
properly reflect subsequent job performance, depending on a wide range of
circumstances. Further studies relating group differences in test scores
to on-the-job performance (e.g., Bray, 1972; Campbell, Pike, & Flaugher,
1969) are clearly needed.

The objective identification of test bias parameters requires
consideration from more than a purely psychometric perspective. An
early effort undertaken by an American Psychological Association
task force (1969) to identify and define sources of bias in employment
practices attempted to consider all aspects of the employee selection
and promotion processes. These aspects include reception facilities,
employer attitudes, aptitude testing, interview protocols, biographical
data, and performance evaluation methods. The basic concern was the
possibility of inadvertently introducing bias at various stages of the
process, from the preliminary screening by the receptionist to the final
decision made by the personnel director.

The basic recommendation made was that validation of objective data
should be undertaken whenever possible to ensure that the information
needed to make personnel decisions is both available and appropriate.
The conclusion reached was that statistical validity, as it affects the
evaluation instruments, is the most important factor in determining the
presence of bias in the selection process. Thus selection for employment
or promotion should be made on the basis of as many objective,
visible indicators as possible.

A number of court cases (Ruch, 1972) have provided quasi-legal
descriptions of factors that may define test bias. Cases have included
(1) those in which the prediction equation observed for minority groups
is different from the equation computed for the general sample on which
the test was validated and (2) those in which the percentage disqualified
by the test is larger for minority groups than for the general validation
sample. In one view, the existence of differences between the mean test
scores of racial or ethnic groups (leading to different proportions being
selected) is prima facie evidence of bias. In this view, the burden of
proof is on the user to establish the validity of the predictor. A more
recent Supreme Court decision (Washington vs. Davis, 1976) denies that
prima facie evidence can be established merely on the basis of
differentials in hiring rates (which may be associated with differences in test
performance).

Cleary et al. (1975) have examined the assumptions and technical
problems related to the use of aptitude measures in personnel decisions,
making special reference to those aspects of test bias and fairness
addressing test misuse, test score misinterpretation, and the measurement
of multiple skills. They view the issue of fairness, which
generally pertains to test use, not test content, as a problem common
to both minority groups and the general population. The concept of
fairness depends upon a number of factors, the major one being the
responsible professional's knowledge of the strengths and weaknesses
of the test and the appropriateness of particular applications. In
this view, both bias and fairness are more strongly related to predictive
(criterion-related) validity than to any other factor: The higher
the validity, the more fair the test (or other measure). This statement
also holds true when separate regression equations are generated
to accommodate two or more groups. Cleary
et al. (1975) and Reilly (1973) describe situations in which over- or
under-prediction results from an artifact of the population distribution:
when two groups can be assumed to come from the same general bivariate
population, the predicted performance using a common regression line can
be expected to result in over-prediction for the group at the bottom of
the distribution when compared with prediction resulting from a separate
equation computed for that group. Conversely, the performance of those
at the top of the distribution will be under-predicted to some extent.
Thus, if some identifiable group occupies a particular area at either
end of the distribution of a sample sharing a common prediction equation,
there will be a tendency to under- or over-predict performance, depending
upon its rank in the distribution. Flaugher (1974) substantiates this
fact, citing a number of studies in which the predicted performance of
minority group members was better than their actual performance when a
regression equation based on all groups was used.
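
The comparison between a common regression line and the actual outcomes
of each group can be sketched numerically. What follows is only an
illustration of the arithmetic, not the procedure used by Cleary et al.
or Reilly: the scores are invented, and the function names (fit_line,
mean_residual) are hypothetical. A negative mean residual means that the
common line over-predicts the group's performance.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(xs, ys, slope, intercept):
    """Average of (actual - predicted); negative means over-prediction."""
    return sum(y - (slope * x + intercept) for x, y in zip(xs, ys)) / len(xs)


# Invented test scores (x) and criterion scores (y) for two groups.
low_x, low_y = [1, 2, 3], [1.0, 1.5, 2.0]
high_x, high_y = [5, 6, 7], [5.0, 5.5, 6.0]

# Common prediction equation fitted to the pooled sample.
slope, intercept = fit_line(low_x + high_x, low_y + high_y)

low_bias = mean_residual(low_x, low_y, slope, intercept)
high_bias = mean_residual(high_x, high_y, slope, intercept)
print(f"low group: {low_bias:+.3f}, high group: {high_bias:+.3f}")
```

With these invented numbers the common line over-predicts the group at
the bottom of the distribution by about 0.14 criterion points and
under-predicts the group at the top by the same amount, which is the
direction of error described above.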

Other definitions of test bias have been advanced by Thorndike
(1971) and Cole (1973), among others. Thorndike indicates that even
when validities are equal, tests may be unfair to lower scoring groups
in the sense that the proportion who qualified on the test can be
smaller than the proportion qualified on the job.

The use of the proportion who qualified versus the proportion who
would succeed on the job seems to be a reasonable standard for
determining the presence of bias. However, Cole (1973) advances the view that
given one member of the majority group and one member of a minority
group, both of whom would succeed if selected, fairness requires that
each have the same probability of being selected.
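
The two standards just described can be contrasted on a simple table of
counts. The sketch below rests on stated assumptions: the counts are
invented, the names (GroupCounts, thorndike_ratio, cole_probability) are
hypothetical, and whether a rejected applicant "would succeed" is treated
as known, which it never is in practice. The first function compares the
proportion qualified on the test with the proportion who would succeed
on the job; the second gives the probability of selection among those
who would succeed.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    selected_success: int   # selected, would succeed on the job
    selected_failure: int   # selected, would not succeed
    rejected_success: int   # rejected, would have succeeded
    rejected_failure: int   # rejected, would not have succeeded

    @property
    def total(self):
        return (self.selected_success + self.selected_failure
                + self.rejected_success + self.rejected_failure)


def thorndike_ratio(g):
    """Proportion qualified on the test over proportion who would succeed."""
    prop_selected = (g.selected_success + g.selected_failure) / g.total
    prop_success = (g.selected_success + g.rejected_success) / g.total
    return prop_selected / prop_success


def cole_probability(g):
    """Probability of being selected, given that the person would succeed."""
    return g.selected_success / (g.selected_success + g.rejected_success)


# Invented counts for illustration only.
majority = GroupCounts(40, 10, 20, 30)
minority = GroupCounts(15, 5, 25, 55)

for name, g in (("majority", majority), ("minority", minority)):
    print(name, round(thorndike_ratio(g), 3), round(cole_probability(g), 3))
```

Under either standard, equal values for the two groups would count as
fair in that model's sense. In this invented example the minority group
has the lower value on both measures, but the two measures need not
agree in general.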

It should be noted that these models of bias, including the purely
statistical models, contradict each other in particular cases. In fact,
Petersen and Novick (1976) point out that only two of the seven models
they reviewed were internally consistent with respect to their logical
converses. Cronbach (1976) suggests that, at the least, psychometrics
can help lawyers and philosophers to "put more substantial arguments
behind competing rules for obtaining equity" (p. 41).

2.2. Proposed Remedies for Bias

Three remedies for bias that have been suggested are (1) the
elimination of testing, (2) the differential interpretation of test scores
for different groups, and (3) purging the tests of sources of bias.
The first remedy has been suggested in equal opportunity guidelines
(EEOC, 1970). These guidelines imply that testing is inappropriate when
the following conditions prevail:

(1) Validity data are neither available nor being collected;

(2) Promotion or selection procedures have adversely affected
minority groups.

Fortunately, the tests used by the armed services have, in general,
been subject to good validity research. The availability of many
incumbents has permitted repeated validation in a variety of
circumstances. The only apparent insufficiency, one that is universally
common to validity research in all sectors, is the reliance on success
in training, instead of on-the-job performance, as the criterion of
success. However, adequate on-the-job performance measures generally do
not exist, and training success may be more important since inability to
complete training removes the opportunity for on-the-job performance.

The second remedy, differential interpretation of test scores,
might be achieved by adjusting the scores of minority group members
who are adversely affected by test use. An equivalent procedure involves
making qualifying scores contingent on group membership. Other related
procedures have been suggested also (Cole, 1973; Einhorn & Bass, 1971;
Guion, 1966; Petersen & Novick, 1976). In practice, these procedures
have often been used by universities wanting diversity in their student
bodies. The modification of admissions standards for minority group
members has on several occasions, however, resulted in legal action
against universities (e.g., Bakke vs. Regents of the University of
California, 1975; Ginger, 1974). The ethical issues involved in
implementing different personnel processing procedures for different population
subgroups are complex (Anastasi, 1968, pp. 280-286; Darlington, 1971;
Kirkpatrick, Ewen, Barrett, & Katzell, 1968, pp. 3-120).
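
The equivalence noted above between adjusting scores and making the
qualifying score contingent on group membership can be shown in a few
lines. This is a minimal sketch under stated assumptions: the cutoff,
the size of the adjustment, the group labels, and the function names are
all hypothetical and are not taken from any of the cited procedures.

```python
def qualifies_with_adjustment(score, group, cutoff, adjustments):
    """Add a group-specific adjustment to the score, then apply one cutoff."""
    return score + adjustments.get(group, 0) >= cutoff


def qualifies_with_group_cutoff(score, group, cutoff, adjustments):
    """Leave the score alone, but lower the cutoff by the same amount."""
    return score >= cutoff - adjustments.get(group, 0)


CUTOFF = 50                     # hypothetical qualifying score
ADJUSTMENTS = {"group_b": 5}    # hypothetical adjustment, in score points

applicants = [(48, "group_a"), (48, "group_b"), (44, "group_b"), (50, "group_a")]

for score, group in applicants:
    adjusted = qualifies_with_adjustment(score, group, CUTOFF, ADJUSTMENTS)
    shifted = qualifies_with_group_cutoff(score, group, CUTOFF, ADJUSTMENTS)
    print(score, group, adjusted, shifted)
```

Both functions return the same decision for every applicant, since
adding a constant to the score and subtracting it from the cutoff are
the same inequality. The sketch says nothing about whether either
procedure is advisable; that is the ethical and legal question raised
in the text.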



The third, alternative approach that has been attempted is the
development of so-called culture free or culture fair tests that are
valid predictors of job performance. The logical consequence of this
concept, culture fairness, is that the average score of each subgroup
will be the same. However, no such content has yet been found that will
yield this result. Furthermore, the record to date strongly suggests that
the search for completely culture fair content is not a promising activity
(Anastasi, 1968, pp. 280-286; Dyer, 1960; Lorge, 1953, pp. 76-83; Tannenbaum,
1965, pp. 721-723). While complete culture fairness may not be probable,
limiting sources of bias such as language usage may limit cultural bias
in tests which are otherwise valid instruments.

3. Rationale for Investigating the Application of
Sociolinguistic Principles to Testing

Because of its size, the military establishment is dependent on
easily administered assessment devices for the evaluation, selection,
and placement of personnel, particularly enlisted personnel. The
devices used, and indeed massively used, are group administered,
multiple-choice, objective, machine scored aptitude tests. Indeed, the advantages
of such tests are so apparent that their use has also become pervasive
in American industry and education.

All youths who seek entry as enlisted personnel into any military
service take initial selection and subsequent classification test
batteries. The influence of the batteries is obvious: The strengths
and weaknesses of military personnel tests affect the careers of a
large segment of American youth. The development of techniques that
improve the objectivity of military testing by reducing inadvertent
variance due to linguistic structure or other unintentional
complexities should have potential application to aptitude testing in
general.

The present paper suggests that the developing body of
sociolinguistic research might lead to the formulation of principles that
could be used systematically to improve the language aspects of tests.
The tactic adopted in the present work was for professional
sociolinguists to analyze a sample of existing cognitive test material,
identifying possible problems and seeking to determine the feasibility
of formalizing sociolinguistic principles of test item language. At
a later time, use of the resulting principles might help avoid language
problems in future development of armed services selection batteries.
The principles developed in this paper, however, should not be
uncritically accepted and applied without rigorous investigation to determine
effects on test reliability and validity in the test-taking population
in general and in ethnic subgroups in particular.

The sections immediately following present some ideas about the
potential contributions of sociolinguists to test construction. The
major purpose of this effort is to provide a theoretical analysis
useful in assessing the feasibility of applying linguistic concepts to
testing. Mentioned are several approaches to (1) the systematic
formulation of principles heretofore only informally stated and applied
and (2) the identification and adoption of new principles of test
construction.

4. Sociolinguistics

How is sociolinguistic research relevant to the construction and
interpretation of tests?

In the past 40 years, a considerable body of research has
accumulated on the varieties of American English. Such language differences
reflect differences in the composition of society. Clearly age, class,
ethnic group, sex, and geographical location all condition the language
a particular individual uses. This conditioning is, in turn, affected by
the setting and purpose of any given language exchange. The nature and
variety of American English that an individual employs and the facility
with which particular varieties are used are functions of the user's
socialization and personal history.

It should be noted that each variety of American English has its
own degree of appropriateness to a particular situation. Each of the
several ways of inviting someone to dinner ("Have you eaten yet?",
"Do you want to eat?", "Are you hungry?", "Can you stay for dinner?",
"You are cordially invited to join us for dinner," and "D'j 'eet?")
is appropriate for a given occasion. In addition, there are levels
and kinds of language appropriate for spoken, as well as for written,
language. Such a view is contrary to earlier judgments in which language
was presented in terms of a simple dichotomy, the correct versus the
incorrect. The more recent view rejects a single hierarchy of
language levels, the kind of ladder that places the formal or literary
at the top, the informal and colloquial in the middle, and the vulgar
or illiterate at the bottom. Rather, it recognizes such categories as
familiar and formal language as appropriate functional varieties.

The pluralistic nature of our social and educational structure
seems almost to defy language classification. Clearly, a "standard"/
"nonstandard" dichotomy does not seem adequate to capture the richness
of a multidimensional language like contemporary English, nor does the
value judgment implicit in such a dichotomy seem warranted. Nonetheless,
it is true that those varieties of American English most often used to
communicate formally in public settings, or to converse with non-intimates,
are at one end of a continuum. At the other end are those "nonstandard"
varieties which are used in less formal communication among intimates.
Type of usage is also correlated with the educational background of the
speaker, with more educated speakers tending to prefer the formal,
standard variety. Informal or nonstandard usage by educated speakers
would be placed near the middle of the continuum.




The language used in most tests is drawn almost entirely from the
formal range of the spectrum. Furthermore, test language tends to
reflect written rather than spoken usage. In particular, this variety
(formal written) involves the use of complex sentence structures and
vocabulary elements rarely found in the spoken language. But test
takers differ with respect to previous exposure to formal standard
language. Those who in their social environment have had less exposure
to this variety will tend to have correspondingly less facility in
speaking, reading, and writing it. This situation does not imply
that the cognitive capacities of such speakers are limited. Indeed,
the virtuosity exhibited by some individuals in their use of nonstandard
language forms requires a variety of linguistic skills.

A hypothesis advanced in this paper is that the less exposure an
individual has had to the language typically used in tests, the greater
will be the linguistic difficulty encountered in taking the test. One
would therefore expect the level of linguistic difficulty to be greater
for those who typically employ nonstandard varieties of English or who
come from environments where English is not the primary language. To
the extent that these individuals are able to use the language of their
own environments effectively, one would expect effective communication
in new situations when given the opportunity to learn the linguistic
demands of these situations and to practice skills needed to meet these
demands.

Sociolinguistics, then, deals with the particularities of the
interaction of language type and social experience. The evaluation
of language correctness and the prescription of linguistic etiquette,
however, are not proper functions of sociolinguistics. As a social
science, sociolinguistics does aspire to a systematic understanding of
the interactions between subculture, language variety, and language
comprehension. It is anticipated that the application of
sociolinguistic analysis and research will provide another perspective on
some of the problems associated with the language of testing.

The present report does not promise a comprehensive treatment
of testing problems from the point of view of sociolinguistics. Its
purpose is to show by examples how a sociolinguistic application might
be approached. An obsolete military selection test battery will be
used as a representative and illustrative example. Accordingly, the
discussion focuses on several areas in which language-related concerns
are appropriate to test construction, administration, and
interpretation. The ensuing discussion includes:

1. An examination of potential nonskill-related difficulties
   arising from language differences.

2. A consideration of test directions from a sociolinguistic
   viewpoint.

3. A statement of four sociolinguistic principles for evaluating
   test items and directions.

4. A critique of the synonyms item type.

The use of this strategy is not intended to convey a negative image of
military tests. In fact, the relatively minor violations of principles
in the test items chosen to illustrate points make our examples seem
at times somewhat labored. Many of the principles, therefore, might
be more properly applied to tests and items containing more flagrant
violations.

5. The Language of Directions

In any test battery, it is important that the test directions
establish a common frame of reference for all the test takers. Only
then can differences in individual performance be attributed to
differences in the skill tested rather than to inadequate test directions.
Orally administered directions are the information-bearing test elements
for which it is easiest to infer equal examinee exposure. But, in spite
of oral directions and the numerous pieces of clarifying information they
convey, the assumption that the directions establish a common baseline
should be seriously examined.

Since directions also serve as introduction to the test, some
attention must also be focused on the setting and the atmosphere they
create. Both of these conditions should convey the intention to be
reasonable and helpful.



5.1. Read and Listen



The directions of the example tests were presented in two language
modalities: the visual (written directions) and the aural (directions
read aloud). Almost all directions are read aloud by the test
supervisor to compensate for possible deficiencies in examinees' reading
ability. This strategy is needed to ensure comprehension of the
information by all participants because the general directions, as well as
those in separate subtests, include fairly long and detailed passages.
In fact, they were longer and more detailed than any of the test items.

The variety of English with which the examinee is familiar may well
condition his ability to understand another variety. Examinees who have
reading difficulties may also be relatively unused to reading or hearing
formal English of the kind found in the sample tests. In this sense,
the test gives an advantage to those social, economic, or ethnic subgroups
who are comfortable with the type of language used in the test. Although
it is not feasible to develop directions to which every examinee is
accustomed, there are a number of language modifications that might be
helpful. Some of these are given below; others are discussed under the
principles presented in Section 7.

First of all, the examiner might be given more leeway in helping those
who do not understand what they have heard. Indeed, the initial
instructions in the example test strongly suggest that this should be done.
The examinee reads: "Listen carefully to all directions. If they are
not perfectly clear, raise your hand. It is very important that you
understand all the directions thoroughly." This instruction leads
the examinee to expect that a request for clarification will be met with
an additional explanation. However, if a question is raised, the
administrator has been instructed to answer it only by reading the
instructions, a procedure which may not be adequate if the problem
is one of comprehension rather than hearing. Perhaps a set of
alternative responses to frequently asked questions could be developed and
furnished to test administrators.

Since there is a relatively common problem of being too explicit in
communication events, achieving clarity is not as simple a matter as
may be assumed. Giving more information than is necessary or giving it
more often than is necessary violates Grice's (1967) Principle of
Cooperation (i.e., that the language used follows the accepted purpose
or direction of the language exchange in which one is engaged).

5.2. Patterns of Repetition

Four information presentation patterns are found in the test battery.
Some information is repeated on almost every page, some is reiterated
for each subtest, and some is found only at the beginning of the
battery. Other information is specific to some, but not all, subtests.
The reasons for these different patterns of repetition are not immediately
obvious. Regardless of their purpose, however, their value to
nonstandard speakers deserves examination, especially since they are
stated in formal, standard styles.

Inconsistent patterns of repetition can seriously mislead the
examinee. For example, at several points in the test the examinee is
urged to work quickly but accurately. In the first subtest, this
instruction is expanded with information about a 7-minute time limit.
However, nowhere else in the battery is time mentioned. The examinee
might therefore be led to assume that since no time limit is mentioned
for the second subtest, none will be applied. This assumption is
clearly inappropriate in light of the 10-minute time limit that is imposed
on this test. The principle illustrated here is that when information
is given, it sets up an expectation or response set. In order to avoid
unwarranted conclusions by the examinee, directions should be such that
all repetition is symmetric. Any changes in test requirements should
be preceded by explicit instructions appropriate to these new requirements.
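
The symmetry principle lends itself to a simple mechanical check during
test development. The following is a minimal sketch, assuming that each
subtest's directions have been summarized by hand into a small record;
the field names (time_limit_min, states_time_limit) and the function
name are hypothetical, and the two records only loosely mirror the
7-minute and 10-minute subtests discussed above.

```python
def missing_time_limits(subtests):
    """Return names of timed subtests whose directions omit the time limit."""
    return [
        s["name"]
        for s in subtests
        if s["time_limit_min"] is not None and not s["states_time_limit"]
    ]


# Hypothetical summary of a battery's directions.
battery = [
    {"name": "Subtest 1", "time_limit_min": 7, "states_time_limit": True},
    {"name": "Subtest 2", "time_limit_min": 10, "states_time_limit": False},
]

print(missing_time_limits(battery))
```

Here the check flags Subtest 2, whose time limit is imposed but never
stated. The same pattern could be extended to any item of information
that is given for one subtest and silently dropped for another.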

5.3. The Supervisor's Delivery

The use of emphasis and negative imperatives to ensure clarity is
valuable but potentially risky. Obviously, the directions should be as
helpful as possible in setting the tone of the examination situation.
Emphasizing negatives and placing stress on particular words in a
sentence, however, may result in an irritating, unnecessarily
authoritarian delivery. Negative imperatives were frequently used in the
test battery to repeat information first presented as a direct
imperative. As such, they were probably a necessary expansion. In general,
the stressed elements in directions to examinees conform to patterns
of stress assignment found in the language as a whole (Bolinger, 19b/;
Crystal & Quirk, 1964; W.k;i?V 1945). However, the assignment of stress
in the directions read by test administrators is sometimes inappropriate
in terms of normal language usage and may have undesirable effects.
Stressing the last part of a compound in a sentence with normal falling
intonation is unusual and distracting; yet it is required in the initial
instruction given to each of the subtests (e.g., "Turn the page and
BEGIN!"). The test administrator is also required to stress a one-word
sentence ("STOP!") at the end of each subtest. Such distortions of normal
stress patterns invite the administrator to shout in order to achieve
the desired effect. In addition, the stressing of "any" in the last test
("Do not go back and work on ANY question in ANY of the other tests.")
may be interpreted as a threat, instead of a simple order, by some of
the more anxious examinees (Green, 1973; Sadock, 1972).

Directions could be easily rewritten to mitigate the potentially
authoritarian tone produced by these stress patterns. Telling
examinees to "BEGIN WORK" or to "STOP WORK" produces a more natural,
less threatening intonation. In summary, the principle invoked here is
that any distortion of normal speech in the test situation may be
disconcerting to the test taker and should be avoided wherever possible.
The use of a specific variety of English may in and of itself present
difficulties for the test takers and, further, distortions of normal
language patterns may create what appears to be a hostile environment.
Insofar as these factors interfere with an accurate assessment of
what is being tested or produce unnecessary antagonism toward the agency
sponsoring the testing, they should be modified.

6. Cultural Considerations

The most subtle potential for test bias rests in the unstated
assumptions, both social and linguistic, of the test constructor. Since
these assumptions concern language or cultural matters regarded as
inherently natural, self-explanatory, and completely obvious, the
measurement expert may be hard pressed to recognize them as matters requiring
attention. The linguistic example given below highlights the problem by
illustrating a language feature that the native speaker would probably
never question. Instead, he might assume that all languages are functionally
equivalent, that they operate within the same frame of reference and make the
same kinds of distinctions.

An example of the kind of problem that poses difficulties for
non-native speakers (even those who have attained relative fluency in
English) is the use of the article "a." This article has both a generic
reading (e.g., A human brain is heavier at birth than is a frog brain;
She is a Marilyn Monroe.) and an indefinite, specific reading (e.g., A
man came into the store this morning.) (Lawler, 1972). In many test
items, an object or person is first introduced in the generic sense and
later, when further information is added or requested, treated in a
specific sense. This procedure is prevalent in tests and may be
considered a characteristic trait of test language. For example, "A man
came into the hardware store and bought a quart of paint. He also bought
. . . The prices were . . . How much did he spend?" In some languages,
this ambiguity of the article "a" does not exist; an examinee whose
native language makes the distinction explicit might not automatically
equate "a man" and "he," and so may be confused by the ambiguity in test
items in English. The problems, which do not exist for those who speak
only English but may exist for others, can be ameliorated by substituting
proper names or other specific designations for "a man."



More pervasive in the test battery, but more amenable to correction,
are the cultural assumptions that condition what is the "best" answer to
a given test question. These are most apparent in those subtests where
objective criteria for determining correct answers are either unclear
or unavailable. The following item, taken from a Word Knowledge Test,
illustrates the point:

     Potent means

     A  heavy
     B  royal
     C  powerful
     D  drunk

The examinee asked to choose between "heavy" and "powerful" in finding
a synonym for "potent," but who does not know that in formal English
"heavy" could not mean "potent," is at a disadvantage, particularly if
the word has that meaning in the examinee's own speech.



While the relatively minor defects in the particular items
presented above may not be especially harmful, the point to be made is
this: There are subtle differences in the structure of languages,
both formal and informal, that create a potential for the inadvertent
introduction of ambiguity, and possibly bias, to tests. Careful review
of test content by thoughtful test constructors and/or language experts
could probably eliminate most major problems.

6.1. Values Specific to the Majority Culture

The fact that society places a high value on verbal ability is not
itself a problem; deciding which aspects of verbal ability are
important, however, is a problem. The example test's heavy dependence on
vocabulary items reflecting an extremely formal style (Shall I inform
[...]; Cross the road with caution) implies that knowing words of this
kind is of prime concern. In addition, the stimulus item is typically
a more difficult word than the correct response. Proceeding through
the Word Knowledge subtest, the examinee becomes increasingly aware of
the examiner's tendency to use formal words as item stems and more common
ones as alternative responses. Although this lack of symmetry may be
perplexing to some examinees, it is actually intentional. The use of
alternative responses that are more likely to be known by all examinees
helps to ensure that incorrect responses result from unfamiliarity
with stimulus words, and not with response alternatives.

In several instances, test items may penalize particular subgroups
of the test-taking population. The word feat, meaning an accomplishment
showing unusual skill, is illustrative of a particular type of
defect. A Spanish-speaking examinee misreading this word as fete
(festival) or trying to relate it to a Spanish cognate may mistakenly
choose the word celebration as the correct answer. This examinee
appears, therefore, to be penalized by attempting to exercise a
productive and useful bilingual skill. It is likely that this item may
indeed fulfill the purpose for which it is intended: discriminating
between examinees who know the word's meaning and those who do not. The
point, however, is that, in the face of uncertainty, some feature of the
examinee's language or culture may determine the attractiveness of
alternate choices. The example given here suggests that a non-Spanish-
speaking examinee who does not know the meaning of the word fete might
make a random choice, thereby having a 25% chance of correctly answering
the item. Spanish-speaking examinees, on the other hand, might more
frequently employ the bilingual skill mentioned above, choosing a
particular incorrect alternative, celebration, more often. When
attempting to devise plausible alternatives for multiple-choice items,
test item writers should exercise care in order to reduce the
possibility that specific alternatives are differentially attractive to
subgroups defined by common cultural or linguistic characteristics.
Standard item analysis procedures could be used to empirically assess
possible differences.
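
One way such an item analysis might be sketched is to tabulate, for each
subgroup, the proportion of examinees choosing each alternative and then
compare the proportions. The response data, group labels, and function
names below are hypothetical and serve only to illustrate the comparison;
they are not results from the test battery.

```python
from collections import Counter

def choice_rates(responses, options="ABCD"):
    """Return the proportion of examinees choosing each option."""
    counts = Counter(responses)
    total = len(responses)
    return {opt: counts.get(opt, 0) / total for opt in options}

def rate_differences(group_1, group_2, options="ABCD"):
    """Difference in choice rates (group_1 minus group_2) per option."""
    r1 = choice_rates(group_1, options)
    r2 = choice_rates(group_2, options)
    return {opt: r1[opt] - r2[opt] for opt in options}

# Hypothetical responses to a single item. "B" stands in for a
# distractor such as "celebration"; "C" stands in for the keyed answer.
spanish_speaking = list("BBBCBCBBAD")
other_examinees = list("CACBDCACBD")

diffs = rate_differences(spanish_speaking, other_examinees)
for option, diff in diffs.items():
    print(f"{option}: {diff:+.2f}")
```

A large positive difference on one distractor, as for option B in this
made-up data, would flag the alternative for review; it would not by
itself show that the item is biased.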

6.2. Other Particular Problems

Another potentially troublesome situation becomes apparent when
one realizes that most words have several possible, sometimes divergent,
meanings. The implications of multiple meanings can be shown by
referring to four words found in the Word Knowledge subtest.

According to Webster's Third International Dictionary, the word
ample is defined as:

     1. Marked by extensive or more than adequate size, volume, space,
        or room.

     2. Buxom, portly.

In light of these definitions, two of the alternative choices, fat and
well-shaped, might be considered as defensible choices. Well-shaped
might be chosen by an examinee whose subculture considers portliness
to be a physically attractive quality.

Likewise, an archaic definition of scour ("beat, punish") recorded
in the same dictionary might make the choice of whip acceptable.
Similarly, one definition of sullen ("of a dull color, of somber hue")
could possibly make two of the choices, grayish yellow and very dirty,
seem reasonable. A closely related problem is illustrated by an item
testing the meaning of terse, defined in Webster's Third International
Dictionary as "smoothly elegant: polished, refined" and "devoid of
superfluity: brief, concise." Although the keyed response, pointed,
is the best choice available for terse, it is not an obvious synonym
for either of Webster's definitions.

Granted, the problems illustrated are not severe in the sample test,
especially since the instructions direct the examinee to select the best
answer. However, one must ask the question, "Do vocabulary items with
these types of distractors represent the most effective approach to
measuring vocabulary or verbal ability?" Are these kinds of word
discriminations, which may in fact have a spurious attractiveness for
some subgroups, the best choices which could be made if viewed from a
semantic or linguistic perspective?

6.3. Errors of Omission

In constructing a test such as Arithmetic Reasoning, test writers
typically use examples which they assume will reflect the everyday
experiences of most examinees. In doing this, however, the tester may
exclude useful material. It seems appropriate, therefore, to examine
test materials to determine what the examiner may have omitted as he
tried to select only common material.

The sample test's failure to reflect the diversity of the population
taking the test illustrates the tendency for omission. Persons named
in the test are called Tom, Bill, John, or Joe: typical white,
middle-class names. The Puerto Rican or Mexican-American finds nothing in
the test that acknowledges the existence of his culture. Women are
conspicuously absent also, even in traditionally female situations such
as purchasing food and clothing. This practice certainly avoids
stereotyping but at the cost of ignoring women completely. Attention
to such details might well lead to the inclusion of a greater variety
of material, material that would produce a more appropriate balance of
content with no sacrifice in clarity or reasonableness. Even minor
revisions might have a beneficial psychological effect on minorities
or cultural subgroups.

7. Formulation of Some Sociolinguistic Principles

As indicated in the Introduction, this report predicates the
potential value of sociolinguistic principles formulated with test
construction in mind. Because such principles are not readily apparent
from the examination of the literature of either sociolinguistics or
testing, active steps are required to bring the formulation about.
To do this, specialists in various aspects of sociolinguistic study
were directed to use their knowledge of different varieties of English
and ethnic and minority value systems in order to predict potential
sources of difficulty in the test battery. These specialists were
chosen for their work on language differences in American English,
including standard and nonstandard regional variations, and for their
research experiences with the problems of non-native speakers. Their
task was to explore the application of sociolinguistic principles to
two of the sample subtests (Arithmetic Reasoning and Automotive
Information) that rely on language to formulate individual test
questions.

A judgmental analysis of these subtests indicated that four specific
sociolinguistic principles are important both in describing areas in
which minority examinees encounter difficulty and in suggesting remedial
action to neutralize these difficulties.

7.1. The Principle of Pragmatics

The principle of pragmatics states that the values implied or
stated in test items should be consistent with the values of the
examinee. Mass testing procedures often assume that the item writer and
the examinee understand an item within the same frame of reference. The
test constructor cannot know the value systems of all the subpopulations
who will take the test, but a sociolinguistic reviewer may be able to
alert him to potential problem areas. An examiner sensitized in this
manner could, presumably, avoid difficulties arising from differences
between examinee values and those implied in test items, differences
that usually arise from differences in the backgrounds of examiners and
examinees. The examples below may help to clarify the differences in
values that are likely to be encountered.



     An insurance policy costs $7.70 a month,
     or $85.00 a year. How much money will a
     person save each year by paying for a
     year's insurance at one time?

     A  $ 5.00
     B  $ 7.40
     C  $ 8.40
     D  $92.40

     A man paid $150 for a set of 4 new tires.
     After using them for 10,000 miles and
     paying out $8 for repairs, he received $2
     apiece for them toward a new set. How
     much per mile did he pay for the set of
     tires?

     A  $.0134
     B  $.0130
     C  $.0168
     D  $.0672

These items, dealing with buying insurance and calculating the cost per
mile for tires driven over a long distance, presuppose familiarity with
the allocation of financial resources. This assumption, however, is
not necessarily realistic for examinees from low-income backgrounds.
For example, low-income examinees typically experience situations in
which credit buying is customary. Insufficient income often prevents
the choice of any other type of payment, making decisions related to
credit versus noncredit buying somewhat academic. The concept of
long-range benefits, as invoked in the insurance item, may be completely
foreign to the low-income examinee's economic frame of reference. To
those who have internalized the value system of the impoverished,
these items call for decisions that might be strange. A difficulty
with strict application of the principle of pragmatics is that some
values and experiences, as in work values, may be highly relevant to
the demands of the job environment and hence be important to the validity
of the test item. Care must be taken to evaluate critically both sides
of the issue on an item-by-item basis. In summary, the principle of
pragmatics suggests that test items should avoid posing situations
that are uncharacteristic or atypical of the life styles of test takers,
especially when these situations are experienced differentially by
various examinee subgroups and are not criterion related.

7.2. The Principle of Processing

The principle of processing, reflecting the assumption that items
can be categorized in terms of the language and reasoning processes they
require, suggests that particular item categories, or subtests, should
contain only items that require the same process(es). The term
"processing" is related to the test taker's ability to respond
appropriately to different types of information ordering. This entails
dealing with situations in which the nature of the information given
varies in several significant ways.

Several items in the Arithmetic Reasoning Test appear to require
different combinations of information processing skills. Consider
the following items:

     Tom bought 108 pounds of nails.
     If he gave 32 pounds to his
     brother, how many pounds did he
     have left?

     A  140
     B   86
     C   76
     D   72

This information presentation requires only a simple subtraction. The
item can be answered without recourse to the answer choices.

On the other hand, an examinee must first consider the alternative
choices in order to arrive at the expected answer for the following
item:

     An article that sells for $5.00 costs the
     customer $5.10 when the sales tax
     is included. What is the sales tax?

     A  5%
     B  [illegible]
     C  2%
     D  1%


The correct response, if answered using only the information presented
in the item stem, would be "10 cents," rather than 2% as required by
the options. Here the examinee must rely on information given in the
stimulus material and on the answer choices, since the question makes
no mention of percentages. In addition to the simple calculation
required, the test taker must also realize that an additional step,
conversion to a percentage figure, is implicitly demanded. The
discrepancy can be avoided by following the test construction practice
of having a completely self-contained stem. In the above example, stating
the question as "What is the percentage of the sales tax?" can solve the
problem. Now the examinee can rely on the stem or stimulus material to
arrive at the answer.
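
The two readings of the sales tax item can be made concrete with a short
sketch. It uses only the two prices given in the stem; the function and
variable names are illustrative and not part of the test.

```python
from decimal import Decimal

def sales_tax_amount(list_price, price_with_tax):
    """Tax expressed in dollars: the answer the stem alone asks for."""
    return price_with_tax - list_price

def sales_tax_percent(list_price, price_with_tax):
    """Tax expressed as a percentage of the list price: the answer
    the options require."""
    amount = sales_tax_amount(list_price, price_with_tax)
    return amount / list_price * 100

list_price = Decimal("5.00")
price_with_tax = Decimal("5.10")

amount = sales_tax_amount(list_price, price_with_tax)
percent = sales_tax_percent(list_price, price_with_tax)
print(f"tax in dollars: ${amount}")
print(f"tax as a percentage: {percent}%")
```

The first function answers the question as literally worded (10 cents);
only the second, which adds the conversion step, yields the keyed 2%.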

Still another set of information processing skills is needed to
answer another type of Arithmetic Reasoning item.

     Joe buys 9 shirts and pays $1 for a tie.
     The total cost is the same as Bill spends
     when he buys 4 shirts and pays [amount illegible] for a
     hat. If all shirts cost the same, what
     was the price of each shirt?

     A  $2
     B  $3
     C  $4
     D  $5

To answer this item type correctly, the examinee must set up and solve
algebraic equations.

It is important to note that although all the various types of
items require active calculation on the part of the test taker, they
differ in the kind of information requested and the type of process
required. In summary, it seems that a particular response set may be
established by a series of items requiring similar information processing
or reasoning skills. It is suggested that subsequent items should not
require widely different skills, unless the test is designed to reflect
the ability to select appropriate processing strategies. Although a
test like Arithmetic Reasoning may have this purpose, there are other
tests that do not. Care should be taken to ensure that when items
differing with respect to type of reasoning processes required are
included within a given subtest, the varied items were included by design
and are necessary to the purpose of the test. For example, a varied
sample of reasoning processes would be required in the case of summation
scores where higher scores are intended to mean more ability/mastery
of mathematical principles.



7.2.1. Too much information. In some items, the examinee will
encounter a mismatch between the amount of information available and
the amount needed to solve the problem. A test taker may anticipate
that all the information given in a problem is to be used in its
solution, only to find out later that some of it is irrelevant. This
situation may or may not be desirable depending on the tester's purpose.
If the purpose is to assess the examinee's ability to ignore irrelevant
information, including such information is quite appropriate and, in
fact, necessary. This practice is commonly used in the development of
the so-called data sufficiency items found in a number of well-known
tests.

If, however, the tester's purpose is to assess the ability of the
examinee to reason from relevant information, then it seems desirable
to include only information required to solve the problem. Consider
the following item:

     Two cars started from the same town at
     the same time. One car traveled 50 miles
     an hour for 4 hours. The other car traveled
     60 miles an hour for 8 hours. How many miles
     farther did the second car travel?

     A   10
     B   40
     C  200
     D  280

Giving information about the starting time of the two cars leads the
examinee to expect that the solution will in some way involve the
arrival and departure times of the cars. However, the information given
in the first sentence of the item is unnecessary for the problem's
solution; some would regard this information as completely extraneous.
In essence, the inclusion of such irrelevant information violates a
principle of language usage that Grice (1967) has labeled the "Maxim of
Relation," a principle assuming that only relevant information is given.
Violation of this principle not only fails to meet basic test
construction principles, but the increased verbiage has particularly
devastating effects on poor readers and normally poor test performers
prevalent among many different socioeconomic groups. A sociolinguistic
application of this principle to testing would suggest that considerable
effort should be taken to avoid inclusion of irrelevant information in
test items.

7.2.2. Insufficient information. In the example below, which
deals with lump-sum versus monthly payments, it is possible from the
way the facts are stated to suppose that the lump-sum figure and the
monthly figure are equivalent, unless the test taker stops to calculate
their relationship.

     An insurance policy costs $7.70 a month,
     or $85.00 a year. How much money will a
     person save each year by paying for a
     year's insurance at one time?

     A  $ 5.00
     B  $ 7.40
     C  $ 8.40
     D  $92.40

Nothing that is overtly stated makes it clear that the annual rate is
lower than the monthly rate, and test takers from low-income backgrounds
are unlikely to be aware that such is usually the case. A simple
rewording of the item would add to the verbal content but make it
more acceptable.
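
The relationship the test taker is left to discover can be written out
as a short calculation. The sketch below uses only the two figures given
in the insurance item; the variable names are illustrative.

```python
from decimal import Decimal

monthly_rate = Decimal("7.70")   # dollars per month, from the item
annual_rate = Decimal("85.00")   # dollars per year, from the item

# Cost of a year's insurance when paid month by month.
cost_paid_monthly = monthly_rate * 12

# Saving from paying for the year at one time.
saving = cost_paid_monthly - annual_rate

print(f"twelve monthly payments: ${cost_paid_monthly}")
print(f"saving with one annual payment: ${saving}")
```

The unstated intermediate figure, $92.40, is itself offered as
alternative D; the keyed saving of $7.40 appears only after the
subtraction step.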

The most serious problems of insufficient information involve those
items that allow legitimate alternative tracks of reasoning and lead to
answers which are scored as incorrect. For example:

     Gasoline costs 20 cents a gallon before
     taxes. There is a 20% road tax on each
     gallon of gas, as well as a 5% city tax
     and a 5% state tax. What is the total
     cost of 8 gallons of gasoline?

     A  $2.08
     B  $2.40
     C  $2.80
     D  $[illegible].80

This item allows for the computation of taxes based either on an
accelerated figure or on a constant base price. Using the accelerated
approach, the examinee would take 20% of the base price (20 cents) and
add the computed tax (4 cents) to the base price. Additional taxes
would be applied to the new total at each step. Although using this
accelerated procedure may not be strictly correct, the current use of
the ever popular surcharge might make such a choice seem reasonable to
many examinees. Since the item is intended to assess arithmetic
reasoning, not specific knowledge of tax computations, the apparent
ambiguity should probably be rectified by including additional
information.
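
The two tracks of reasoning can be compared directly. The sketch below
is a minimal illustration using the figures in the gasoline item; the
function names are illustrative, and the order in which the accelerated
approach applies the taxes (road, city, state) is an assumption.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.20")  # dollars per gallon before taxes
TAX_RATES = [Decimal("0.20"), Decimal("0.05"), Decimal("0.05")]
GALLONS = 8

def constant_base_total(base, rates, gallons):
    """Every tax is computed on the untaxed base price."""
    return base * (1 + sum(rates)) * gallons

def accelerated_total(base, rates, gallons):
    """Each tax is computed on the running total, tax on tax."""
    price = base
    for rate in rates:
        price += price * rate
    return price * gallons

constant = constant_base_total(BASE_PRICE, TAX_RATES, GALLONS)
accelerated = accelerated_total(BASE_PRICE, TAX_RATES, GALLONS)
print(f"constant base: ${constant:.2f}")
print(f"accelerated:   ${accelerated:.2f}")
```

The constant-base reading gives $2.08, which matches alternative A. The
accelerated reading gives about $2.12, so an examinee who follows it
carefully arrives at a figure that is not offered at all.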

7.3. The Principle of Formality

This principle states that the greater the distance between the
variety of English familiar to an individual and that used in a test,
the greater will be the potential linguistic difficulty for the
examinee. The problem is more serious when there are marked differences
between the variety of language an individual speaks and the variety
which he must read than when an individual's spoken usage more nearly
approximates the written form. Nonstandard spoken language varieties
are most characteristically employed by infrequent readers (who are
often of lower socioeconomic class backgrounds) and in informal settings.
Given that most tests are written in a relatively formal variety of
standard English, the principle states that the level of linguistic
difficulty would tend to be systematically greater for individuals from
lower socioeconomic backgrounds and backgrounds where English is not
the primary language than for those from middle-class backgrounds.

The type of language used in testing often has certain peculiarities
that distinguish it from the language of everyday conversation
and even from the formal standard English found in other types of
writing. For the most part, these differences are in sentence structure
and vocabulary choice, and they constitute probably the more serious
and more correctible sources of potential bias in the example test
battery. For example, a sentence like the following, not uncommon in
standardized tests, would be relatively rare in spoken English:

     When measuring an unknown voltage with
     a voltmeter, the proper precaution to
     take is to start with the . . .

No reduction in clarity or diminution of context would result from
reworking this item to read as follows:

     In measuring a voltage with a voltmeter,
     you should be careful to start by . . .

In this rewording, the vocabulary and the syntactical arrangement
conform more closely to natural conversation, thereby eliminating the
barrier resulting from unnecessary formality. The content of the
question remains unaffected.

Throughout the test, more formal lexical items are consistently
chosen over more familiar ones. Words like locate (instead of find),
obtain (instead of get), fails to (for doesn't), and approximately
(instead of about) all reflect such choices. Though certain other items
may appear to be innocuous, further investigation reveals that there
may be subtle shades of meaning involved which could lead to further
misunderstanding by some test takers (Gordon & Lakoff, 1971; Green,
1973).




7.4. The Principle of Redundancy



The principle of redundancy states that the redundancy-reducing
rules characteristic of written English may cause difficulty for
examinees whose familiarity with formal written English is limited. These
rules serve to reduce redundancy by deleting information that is
identical to information previously stated, by converting relative
clauses to more abbreviated constructions, and by introducing various
references to previously mentioned material.

For example, the deletion of the preposition by in a sentence such
as "Bill makes ten dollars a week (by) washing cars" makes the sentence
slightly less clear (though perhaps more conversational). Similarly,
the use of a reduced clause construction in reference to a container
that weighs "1,200 pounds empty" is less clear than the full
construction "1,200 pounds when it is empty." In other items, the kinds
of reduction allowed by English grammar in comparative sentences may have
been used to the potential disadvantage of some test takers. When
reduction is applied to comparative sentences, ambiguity may be
introduced and comprehension reduced. For example, a sentence such as
"John has helped more people than Bill" is ambiguous. It can mean "John
has helped more people than (just) Bill," or "John has helped more people
than Bill has helped." It would be better to give the fuller form, "John
has helped more people than Bill has helped," if that is the intended
meaning.
The item below begins with a complex sentence to which a syntactic
deletion rule called "gapping" has been applied.

     The running time of a movie is 1 1/2 hours,
     of the news reel 10 minutes, of the cartoon
     8 minutes, and of the coming attractions
     7 minutes. At what time would the entire
     show be over if it began at 6:50 p.m.?

     A  8:05 p.m.
     B  8:25 p.m.
     C  8:55 p.m.
     D  Some other time



Gapping allows redundant material to be deleted in a series of similar
constructions after it has been stated in the first member of the series
(e.g., "John ordered fish, and Bill (ordered) liver."). Gapped sentences
may be quite difficult to follow; a very substantial reduction in
difficulty might be achieved in this item by giving the full ungapped
form. In other instances, however, gapping may be effectively applied
to reduce passage length. Inclusion of redundant material often helps
slow readers, especially lip readers, and individuals less familiar with
formal English to understand the content of the test items.
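
The ungapped form of the movie item pairs every running time explicitly
with its segment, which is also how the calculation has to be set up.
The sketch below makes that pairing explicit; the dictionary layout and
the placeholder date are illustrative choices, not part of the item.

```python
from datetime import datetime, timedelta

# Each segment stated with its own running time, as in the ungapped
# form "The running time of the news reel is 10 minutes, ..."
running_minutes = {
    "movie": 90,               # 1 1/2 hours
    "news reel": 10,
    "cartoon": 8,
    "coming attractions": 7,
}

# Only the clock time matters; the date is an arbitrary placeholder.
start = datetime(2000, 1, 1, 18, 50)  # 6:50 p.m.
total = sum(running_minutes.values())
end = start + timedelta(minutes=total)

print(f"total running time: {total} minutes")
print(f"show ends at {end.strftime('%H:%M')} (24-hour clock)")
```

The show runs 115 minutes and ends at 8:45 p.m., so the keyed response
is alternative D, "Some other time."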

7.4.1. Reading level difficulty. In the above paragraphs, it will
be noted that the proposed revisions involving redundancy-reducing rules
quite often require an increase in the length of the sentence. Since
some of the more traditional scales for measuring reading difficulty
(such as the one proposed by Flesch (1951)) view reading difficulty as
a function of sentence length and the number of polysyllabic words, one
might question the effect of redundancy-reducing revisions on reading
difficulty. We suggest that perhaps Flesch's conclusions are more
relevant to some situations than others. For example, some item writers
may employ a style relying on complicated grammatical constructions and
difficult vocabulary. To these writers, Flesch's approach clearly
offers a guideline for remedying their stylistic defects, especially
when the audience is homogeneous and relatively proficient in the
language used. But the enlisted military selection tests place demands
on item writers that are much more rigorous, perhaps requiring other
measures than those suggested by Flesch. Among these other measures
might be the principles suggested in the previous section.
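
For reference, the widely cited Flesch Reading Ease formula is
206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word),
with higher scores indicating easier text. The sketch below applies it
to the reduced and full comparative sentences discussed under the
principle of redundancy. Two assumptions should be kept in mind: that
this formula is representative of the scale referred to above, and that
syllables can be approximated by a rough vowel-group heuristic rather
than a dictionary lookup. The function names are illustrative.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent final e."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

reduced = "John has helped more people than Bill."
full = "John has helped more people than Bill has helped."

reduced_score = flesch_reading_ease(reduced)
full_score = flesch_reading_ease(full)
print(f"reduced form: {reduced_score:.1f}")
print(f"full form:    {full_score:.1f}")
```

Under these assumptions the fuller, unambiguous sentence receives the
lower (harder) score, which is the tension described in the paragraph
above: a length-based scale penalizes exactly the revision that removes
the ambiguity.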

8. Experimental Application of Sociolinguistic Principles to Word Problems

The development of new principles or constructs such as those evolved
from a sociolinguistic context raises numerous questions concerning their
utility, methods of application, the reliability and validity with which
their elements can be discriminated, and perhaps their influence on
increasing the clarity of meaning in written statements (test items). The
lack of empirical data on these questions led to the performance of a
small pilot study to observe basic rating characteristics, response
patterns, and influence of type of subject matter on a rater's judgments.
The three persons who assisted in the development of and were thoroughly
familiar with the four principles, i.e., pragmatics, processing,
formality, and redundancy, were requested to rate the items in two
subtests of the sample tests.

The judges were asked to indicate whether or not specific items
violated the principles and, if so, which principles were violated. The
analysis indicated that on one subtest judges agreed with each other
reasonably well. They agreed upon (1) the items which violated
sociolinguistic principles, (2) the severity of the violation, and (3)
the particular principle involved. There was a noted lack of agreement,
however, between the judges on the other subtest, with very few
indications by two of the judges of a violation of sociolinguistic
principles.

The degree of relationship found between judges on one of the subtests
suggests that the four principles can, with further experimental
refinement, be used to identify potential sociolinguistic problems in
items.
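
The report does not state which agreement statistic was computed. As
one possibility, agreement between a pair of judges on whether each item
violates a principle could be summarized with Cohen's kappa, which
corrects raw agreement for the agreement expected by chance. The
ratings below are hypothetical and the function name is illustrative.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings for eight items: 1 = violates a principle,
# 0 = no violation. These are not data from the pilot study.
judge_1 = [1, 1, 0, 0, 1, 0, 0, 1]
judge_2 = [1, 1, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa(judge_1, judge_2)
print(f"kappa between judge 1 and judge 2: {kappa:.2f}")
```

With three judges, the same function could be applied to each of the
three pairs and the results reported separately for each subtest.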



8.1. Future Applications



A thorough application of sociolinguistic principles to test development
would require a more extensive effort than the attempt made in the
present study. It would entail the following steps: (1) a set of
materials would be examined by sociolinguists, who would then formulate
a set of principles and adequate rating scales for dealing with the language
of tests; (2) the resulting principles would be applied to a new set of
materials to produce tests free from the previously described defects; (3)
unrevised but otherwise identical tests would also be assembled, and the
two sets of tests would be administered to random halves of a group of
examinees. Differences in the test score performance of examinees under
each condition would be noted and subsequently validated against a relevant
criterion. These procedures should be repeated using different materials,
groups, and types of subsequent validating performances. Different
sociolinguistic experts could also be employed to develop different principles
to be examined. Clearly, the number of possibilities would preclude an
all-inclusive investigation. This should not, however, discourage more
modest efforts.
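
The random-halves comparison in step (3) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the fixed seed, and the use of a simple difference in mean scores are hypothetical choices, not part of the procedure described in the report.

```python
import random


def split_random_halves(examinee_ids, seed=0):
    """Shuffle the examinees and cut the list in two (hypothetical helper).

    One half would take the revised tests, the other the unrevised tests.
    """
    ids = list(examinee_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def mean_difference(scores_revised, scores_unrevised):
    """Mean score under the revised tests minus mean under the unrevised."""
    mean_revised = sum(scores_revised) / len(scores_revised)
    mean_unrevised = sum(scores_unrevised) / len(scores_unrevised)
    return mean_revised - mean_unrevised


revised_group, unrevised_group = split_random_halves(range(10))
print(len(revised_group), len(unrevised_group))
```

Any difference found this way would still have to be validated against a relevant criterion, as the text requires; the sketch covers only the assignment and the raw comparison.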


9. The Word Knowledge Subtest: Synonyms

The Word Knowledge subtest is the only test in the example battery
specifically intended to assess a verbal skill. If any of the tasks to
be performed in this subtest are not related to word knowledge, then
the content validity of the test might be questioned. For a sociolinguist,
an attempt to establish content validity would entail framing a concept
for the term "word knowledge" and then determining if the items satisfy
the concept. An even more appropriate method would involve writing test
specifications implied by the concept. Since we must deal here with
an existing test, the latter approach is not possible.

Doing well on the Word Knowledge subtest requires at least three
qualities: the ability to read, a notion of meaning and synonymy, and
a knowledge of a sufficient number of the words tested. Other more subtle
skills one might wish to test include:

1. Knowledge of syntactic constraints (i.e., knowing into what
sentence structures particular words fit).

2. Knowledge of stylistic constraints (i.e., knowing for what
linguistic and social settings particular words are appropriate).

3. Knowledge of semantic constraints (i.e., knowing with what
other ideas particular words can be used).

4. Morphological information (i.e., knowledge of word origins
and derivations).

5. Knowledge of relationships to other words.

6. Knowledge of the presuppositions implicit in words, and
their implications.

7. Knowledge of the pronunciation and spelling of words.

The Word Knowledge subtest does not seem to demand all seven of the
knowledges listed above, although each might be helpful. This suggests
that there is no full assessment of the examinee's word knowledge, nor
was one intended.

But there are problems encountered in the use of the synonymic form
beyond the limitations previously described. One type of mismatch is set
up in the directions in subtask 2 of the test, where the candidate is
asked to decide which choice "most nearly means the same" as the stem word.
In an example the wording shifts, incorrectly and unfortunately, to "means
the same." Clearly the former more accurately reflects the task than the
latter, since very few words are exact synonyms, though they may be judged
approximately so. Mismatches also occur between stem words and correct
alternatives; three of the most frequent kinds of such mismatches are
given below.

9.1. Lack of Semantic Equivalence



In the Word Knowledge subtest, knowing which of the alternatives
carries the same semantic content is very helpful. Experience teaches
that one-to-one equivalence of this kind rarely, if ever, exists. Even
though a limited set of experiences may yield the judgment that a pair
of words are synonymous, only one relatively minor experience is needed
to disprove the judgment. (See Binnick, 1971, 1972, and Lakoff, 1972,
for just such instances of disproof of synonymy.) Even in such a close
pair as sweat/perspiration, the words are not equivalent in all
situations; horses sweat, while people perspire. A man lives by the sweat
(not the perspiration) of his brow. The differences are also apparent
in humour triads such as: I am firm. You are obstinate. He is a
pig-headed fool.



9.2. Scalarity

Language users often behave as if an implicit ranking procedure
operates for many word pairs. Words that refer to approximately the
same objects can differ in relative strength. In the following
examples, for instance, a weaker word is used in the simple sentences.
The assertions in these sentences are made stronger if the phrases in
parentheses are added:

She's intelligent (, in fact, she's brilliant).

The children are happy. (What's more, they're ecstatic.)

I'd say this land is pretty (, even beautiful).

Note that reversing the order of intelligent and brilliant, happy and
ecstatic, and pretty and beautiful (that is, switching to a stronger
first word) produces a particular type of verbal joke.

9.3. Generality

A second type of difference between the stimulus and response
words concerns the distinction between the general and the particular.
Related words, especially those that are mutually substitutable in at
least some situations, can be ranked in two very general kinds of
hierarchical structures (cf. Bever & Rosenbaum, 1970). The following
sentence frames can be used to determine if either hierarchy relates
to a given pair of words:

1. A ____ is a part of a ____.

2. A ____ is a kind of ____.

For words other than nouns, minor modifications of the frames will
yield the correct judgments. The first blank in each frame will, of
course, be filled by the less general term of a pair. For example,
quiet-calm and blemish-defect are such pairs, the first item in each
being contained within the hierarchy of the more general second item.

10. Perspective and Prospects

The foregoing sections have presented a number of sociolinguistic
considerations about the use of language in test construction, and have
raised a number of issues needing critical examination. The present
section will review some of these issues from a psychometric perspective
and then suggest steps that might lead to an appropriate use of
sociolinguistic techniques in testing.

10.1. Perspective

In testing, as in many other areas of the social sciences, practice
of the art is difficult because everyone considers himself an "expert."
Therefore, there exist many commonly held beliefs that are unsupported,
or indeed even contradicted, by evidence. Frequently this evidence is
known by only a small group of researchers, while the belief is popularly
accepted and widely held. A few such beliefs are presented and then
qualified below.

Belief One: Test language is unnecessarily difficult. If simpler
language were used to pose questions, examinees unaccustomed to academic
English would perform better. This contention has been tested by
Bornstein and Chamberlain (1970) who, noting the difficulty of language
in tests of social studies achievement, rewrote test items using simpler
language. They found highly similar performances for the easy and hard
language versions, a finding that is supported by a similar study
(Livingston, 1973).

Belief Two: Psychological tests are not fair to groups who achieve
low average scores. This belief ignores the need to relate scores to job
performance. The military services' extensive programs of research and
development confirm that low scoring personnel may realistically be
expected to perform less well on the job than high scoring personnel.

Belief Three: Psychological tests may be valid for most people
but are not related to the performance of minority group members. The
proponents of this belief have been so influential that it is mentioned
in the guidelines developed by the Equal Employment Opportunity Commission
(Guidelines on Employee Selection Procedures, 1970), and, indeed, there
may be groups for which the belief is true. The extensive research
conducted to date, however, shows tests to be equally valid for minority
and majority groups. Boehm (1972) and Schmidt et al. (1973) have surveyed
the literature of validity differences for Blacks and Whites and have
found that, except in a few studies characterized by small samples and
inadequate controls, substantially lower validities for minority groups
have not been demonstrated.

Belief Four: People who are unfamiliar with tests are at a
disadvantage. A little coaching on test taking would improve their scores.
If this belief were true and if score gains were reliable, many examinees
would be expected to benefit from coaching. Unfortunately, such is not
the case. In three studies sponsored by the College Entrance Examination
Board (Angoff, 1971), coaching was attempted to increase test scores.
These attempts, made at a high-prestige private institution (Dyer,
1953), at a public institution (French & Dear, 1959), and at a rural
school in a depressed area (Roberts & Oppenheim, 1966), were not
successful in raising total test scores. It is currently felt, however, that
coaching might help reduce anxiety for some examinees, and might improve
performance on certain specialized item types. Any such score gains,
however, are expected to be neither large nor pervasive.

Although the existing evidence does not support these beliefs,
some of them are undoubtedly implicitly involved in certain of the
issues raised in the preceding sections. In evaluating the discussion
in these sections, therefore, the following considerations should be
kept in mind:

1. The sociolinguistic principles and evaluations developed in
this report result from a first attempt on a limited amount of material
and should not be judged as a finished or final example of scientific
application.

2. The principles and evaluations are not to be regarded as
universally true, but applicable only in certain situations.

3. The principles and evaluations are only a small part of the
contribution that might eventually be made by the application of
sociolinguistics to testing.

4. The principles and evaluations are not uniquely the property
of sociolinguists; many of the items identified as defective by
sociolinguists could also have been so identified by test constructors for
similar reasons.

The systematic development and application of sociolinguistic
principles to testing will require much more precise formulation and
testing than has occurred to date. Some steps in this direction are
suggested below.

10.2. Prospects

The application of sociolinguistic principles to test construction
would occur in setting test specifications, writing and reviewing tests
and items, and developing interpretive materials. The actual principles
should, to the extent possible, be formalized, and the effectiveness
of their application should be researched. In light of the plethora
of beliefs that have been substantiated only occasionally, research is
particularly important in applications dealing with population subgroups.

10.2.1. Specifications. Test or test battery construction requires
adequate test specifications, regardless of the purpose and context of
testing. In some situations, such as academic selection, there have
been literally hundreds, perhaps thousands, of validity studies. The
most effective predictors are well known and can be specified in advance.
But many situations encountered in the military services require the
early identification of those who will perform well on some relatively
unstudied task. In this case, a variety of item types must be tried
to define those appropriate for use in a selection battery.

Test and item specifications should include item type (e.g.,
analogies, antonyms), content (e.g., verbal ability, automotive
information), statistical specifications (e.g., percent passing each item
and minimum acceptable item-test correlations), and other important
factors such as the number of items, testing time, physical format, and
choice of directions. In choosing an existing set of directions or in
writing new ones, a tester could usefully apply sociolinguistic
principles to make the following decisions: what kind of directions
(oral or written) to use, what level of language is appropriate, how
much flexibility should be given to test administrators, how to use the
imperative in giving instructions, and what level of previous exposure
to testing to assume for various groups of examinees. At present,
decisions with respect to these various aspects of directions are based
primarily on logistical convenience, on existing standard practice, and
on the assumption that identical procedures accomplish equal exposure.
Better specifications or better support for the existing specifications
for directions, as well as other aspects of tests, might result from
a sound research program.
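
The two statistical specifications named above, percent passing each item and the item-test correlation, can be computed from a matrix of scored responses. The following is a minimal sketch, assuming items are scored 0 or 1 and that the item-test correlation is the ordinary Pearson correlation between item score and total score (with the item itself left in the total). The function name and the data layout are hypothetical.

```python
from statistics import mean, pstdev


def item_statistics(responses):
    """Return (percent passing, item-test correlation) for each item.

    responses: one row per examinee, each row a list of 0/1 item scores.
    The correlation is None when an item or the total has no variance.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        percent_passing = 100 * mean(item)
        sd_item, sd_total = pstdev(item), pstdev(totals)
        if sd_item == 0 or sd_total == 0:
            correlation = None
        else:
            m_item, m_total = mean(item), mean(totals)
            cov = mean([(x - m_item) * (y - m_total)
                        for x, y in zip(item, totals)])
            correlation = cov / (sd_item * sd_total)
        results.append((percent_passing, correlation))
    return results


# Four examinees, three items (made-up scores for illustration only).
for pct, r in item_statistics([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]):
    print(pct, r)
```

A specification would then state acceptable ranges for both figures; the particular cutoffs are a matter for the test constructor and are not given in the report.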

10.2.2. Item writing. The item writer could have available a set
of research results and principles that could be used in formulating
items. Some decisions regarding item format would, of course, have
been made when test specifications were established. For example, the
use of extraneous or insufficient information would be a matter of choice
in some item types, such as those in which the examinee must determine
which of several given reasons are sufficient to establish a stated
conclusion. But inadvertent extraneous information might also be
usefully included in arithmetic items. It would, therefore, be helpful
to an item writer to know when he could legitimately complicate the
problem posed by the item and when he could be handicapping a group
whose subcultural expectation of test taking is that all of the
information given must be used. The item writer should have at hand some
indication of the effectiveness of attempts to remove such expectations
through modification of directions.

The item writer must also confront directly the problem of
writing difficult items, items in which the difficulty arises from the
nature of the problem posed, not from the language in which the problem
is stated. Perhaps sociolinguistic research could lead to a separation
of language difficulty and problem difficulty so that one could learn
to pose hard problems in easy language.

10.2.3. Item review. As with many other creative acts, the
writing of test items can proceed in two steps: in the first, the
central idea of the item is conceived and put on paper; in the second,
the rough idea is developed and polished. The principle of pragmatics
is one that could be applied in this second stage, since it implies
that an otherwise appropriate problem could discriminate unfairly if
put in the wrong context. The item reviewer should, therefore, be
relatively free to consider background information related to the
language and culture of ethnic, religious, and sex groups. He should
also be attuned to the possible implications that such information
has for test items. Eventually, a checklist of principles could be
developed for use in evaluating each item for linguistic and cultural
defects.

10.2.4. Test review. After assembling the test, the items and
directions should be examined. At this step, the principle of
processing would be applicable since it deals with items in combination.
This principle emphasizes that answering items having similar content
may require different logical processes. The principle, as stated by
the sociolinguist, suggests that processes should not be mixed. While
the tester might not be averse to mixing such processes, he would
undoubtedly prefer that it be done intentionally. One aspect of the
test review, then, would be to check and evaluate possible
contradictions of the principle of processing.

10.2.5. Pretesting. Good testing practice requires that new items
be administered on a trial basis so that unsuspected defects can be
noted. Some major testing organizations conduct programs of pretesting
and maintain test files that contain a record of each item's statistical
performance. In light of the previous discussion, it seems desirable
to keep the results of statistical analyses of items on population
subgroups. It should be emphasized that group by item interactions, not
overall group differences, would be the most informative indicator of
the quality of items. Angoff and Ford (1973) have long asserted that
such comparisons of item difficulty in groups could be used to identify
particularly troublesome items. For example, certain tool knowledge
items might be more difficult, on the average, for women than for men,
since some of the tools mentioned are seldom found outside factories,
which are traditionally men's domain. More common tools likely to
be found in home workshops might be more equally recognized by men
and women.
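
The distinction between an overall group difference and a group by item interaction can be made concrete with a small calculation. The sketch below is a deliberately simplified illustration and is not the procedure of Angoff and Ford (1973): it assumes item difficulty is expressed as the proportion passing in each group, and the function names and the 0.15 threshold are hypothetical.

```python
def item_by_group_residuals(p_group_a, p_group_b):
    """Difference in proportion passing per item, minus the overall difference.

    Both arguments list the proportion passing each item, in the same order.
    A residual near zero means the item only reflects the overall group
    difference; a large residual marks an item that behaves differently.
    """
    diffs = [a - b for a, b in zip(p_group_a, p_group_b)]
    overall = sum(diffs) / len(diffs)  # the overall group difference
    return [d - overall for d in diffs]


def flag_items(residuals, threshold=0.15):
    """Indexes of items whose residual exceeds the (hypothetical) threshold."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]


# Made-up proportions for four tool knowledge items in two groups.
group_a = [0.8, 0.7, 0.9, 0.6]
group_b = [0.7, 0.6, 0.4, 0.5]
print(flag_items(item_by_group_residuals(group_a, group_b)))
```

In this made-up example every item is somewhat easier for the first group, but only the third item departs sharply from that overall pattern, so it is the one a reviewer would examine for content such as tools seldom found outside factories.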

10.3. Research. It seems likely that the full benefits of
sociolinguistics in testing will require an extended period of development,
application, and evaluation of principles and information. Its
organization, mission, and access to diverse populations make the military
service better suited to carry out such a program than most other
establishments. Military personnel research in the application of
sociolinguistics to testing could produce results that have value not only to
the military establishment but to industrial and educational organizations.
This, of course, assumes that the discipline has the potential and that
research results are disseminated through appropriate professional journals.
Although a complete formulation of a research program of this nature is
beyond the scope of the present paper, some aspects of such a program
are given below.

10.3.1. Some research topics. Developing a research program that
is both comprehensive and relevant to the requirements of the military
establishment goes beyond the resources and scope of this paper, but
some topics can be listed. Clearly, the research required to implement
the development and application of sociolinguistic principles to the
areas identified in the previous section must address a number of issues.
Some of the areas that sociolinguists have felt might be usefully
investigated are listed below:

- the inclusion of extraneous information in reasoning
  items,

- the degree to which the context of reasoning problems
  is appropriate to specific subcultures,

- the use of redundant language in test items and directions,

- the changes in the types of information processing that
  are required by certain items,

- the use of various algorithm-specific directions on
  coding speed performance,

- the modification of statements of purpose found in
  the directions,

- variations in the degree of flexibility given to test
  administrators, and

- variations in the level of difficulty of test language
  (e.g., extensions of the Bornstein and Chamberlain,
  and Livingston studies).

These ideas for study are given as examples only. Additional areas,
varying in the importance of their effects, could be generated also.

At least two lines of research can be identified. One line should
help establish the size and direction of effects on group test
performance (or on other indicators of impact) resulting from systematic
manipulation of the factors listed above. This line of research might
be viewed as useful in establishing the validity of sociolinguistic
concepts. Such exploratory studies may not have immediate application,
but they should prove useful in establishing whether the observed data
behave in a way that is consistent with the theory on which the
techniques are based.

Another line of research is directed at more specific determination
of the effects of applying sociolinguistic techniques to personnel test
situations. These effects are reflected in such test statistics as the
distribution of item difficulties and in predictive validity coefficients.
This approach is consistent with both the goal of changing the
alignment of various population groups and the goal of making this alignment
more consistent with subsequent performance. To make test language
easy at the expense of testing relevant but difficult concepts will
not be useful. Therefore, in addition to understanding the effects
of sociolinguistic manipulations of tests, investigations must also be
useful in choosing techniques that will result in more effective
personnel selection procedures.

10.3.2. Scientific approach. Social scientists, particularly
psychologists, long ago learned that single-factor experiments can lead
to confusing and perhaps contradictory results. They are, therefore,
aware of the importance of factorial experiments that simultaneously
vary several factors. For example, it seems perfectly reasonable to
suppose that the results of changing the motivating effect of directions
would not be the same for examinees coming from different backgrounds.
Specifically, it is hard to imagine that changes in the directions
given in a tool knowledge test could be expected to have the same
effect on a person enlisting for a medical job and one seeking training
in automobile maintenance. Finding the kind and size of any existing
difference requires the simultaneous variation of the group tested
and the type of directions given.

One can see from the few examples above that the list of possible
factors is too long to include each one in a grand factorial
experiment; including only two levels of each of the eight factors listed in
the previous section would require 256 experimental groups. Conducting
such an experiment would be extremely complex, and certainly far
beyond anything that has to date proved manageable in the field of
personnel testing. A programmatic series of experiments aimed at the
systematic development, testing, and application of sociolinguistically
based hypotheses related to test performance seems much more reasonable.
This is simply to suggest, in the tradition of scientific practice, that
orderly, sequential development and experimentation steps be implemented.
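
The figure of 256 groups follows from crossing eight factors at two levels each (2 to the eighth power). The sketch below enumerates the cells of such a design to show how quickly they multiply; the factor names paraphrase the eight topics of section 10.3.1, and the two level labels are hypothetical placeholders.

```python
from itertools import product

# The eight research topics of section 10.3.1, paraphrased as factor names.
FACTORS = [
    "extraneous information",
    "context of reasoning problems",
    "redundant language",
    "information processing changes",
    "algorithm-specific directions",
    "statements of purpose",
    "administrator flexibility",
    "difficulty of test language",
]
LEVELS = ("unchanged", "modified")  # hypothetical two-level coding


def factorial_cells(factors, levels):
    """Return every combination of levels, one tuple per experimental group."""
    return list(product(levels, repeat=len(factors)))


cells = factorial_cells(FACTORS, LEVELS)
print(len(cells))  # 2 ** 8 = 256 experimental groups

# By contrast, crossing only examinee group with type of directions:
small = factorial_cells(["examinee group", "type of directions"], LEVELS)
print(len(small))  # 2 ** 2 = 4 experimental groups
```

Each added two-level factor doubles the number of groups, which is why a sequence of small factorial studies is easier to manage than one grand experiment.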

10.3.3. Implementation. The suggested research approach undoubtedly
requires a sustained effort. Because of the extensive administration
of the current joint services' selection test, the Armed Services Vocational
Aptitude Battery, at the high school level, it would seem that this
population (and its subpopulations) would be suitable for research studies
for which contracts based on either solicited or unsolicited proposals
might be awarded. Most of the data, however, could come from the testing
of incumbent military personnel. These data might be efficiently gathered
and analyzed by using appropriate experimental designs overlaid on
data collection efforts conducted in connection with other military
personnel research. In this manner, data might serve the needs of both
sociolinguistic and military personnel researchers.

It is difficult to discuss organizational methods to reach a goal
so abstract as that of "identifying and developing sociolinguistic
principles for application to test construction." It is, therefore,
suggested that perhaps teams of specialists composed of sociolinguistic
and measurement experts could be allowed to inspect existing personnel
tests, be informed about anticipated development efforts, and be
encouraged to propose research projects pertinent to the goal at hand.
After recommendations are received from those teams and studies completed
by them, the most probable areas of development and the most useful
organizational arrangements should become clearer. A reasonable
immediate outlook is for the development of item evaluation checklists
to assure proper and careful attention to good test construction
principles, from both a psychometric and a sociolinguistic point of view.